Semi-Supervised AUC Optimization without Guessing Labels of Unlabeled Data

نویسندگان

  • Zheng Xie
  • Ming Li
چکیده

Semi-supervised learning, which aims to construct learners that automatically exploit the large amount of unlabeled data in addition to the limited labeled data, has been widely applied in many real-world applications. AUC is a well-known performance measure for a learner, and directly optimizing AUC may result in a better prediction performance. Thus, semi-supervised AUC optimization has drawn much attention. Existing semi-supervised AUC optimization methods exploit unlabeled data by explicitly or implicitly estimating the possible labels of the unlabeled data based on various distributional assumptions. However, these assumptions may be violated in many real-world applications, and estimating labels based on the violated assumption may lead to poor performance. In this paper, we argue that, in semi-supervised AUC optimization, it is unnecessary to guess the possible labels of the unlabeled data or prior probability based on any distributional assumptions. We analytically show that the AUC risk can be estimated unbiasedly by simply treating the unlabeled data as both positive and negative. Based on this finding, two semi-supervised AUC optimization methods named SAMULT and SAMPURA are proposed. Experimental results indicate that the proposed methods outperform the existing methods.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Adaptively Unified Semi-supervised Learning for Cross-Modal Retrieval

Motivated by the fact that both relevancy of class labels and unlabeled data can help to strengthen multi-modal correlation, this paper proposes a novel method for cross-modal retrieval. To make each sample moving to the direction of its relevant label while far away from that of its irrelevant ones, a novel dragging technique is fused into a unified linear regression model. By this way, not on...

متن کامل

Learning with Unlabeled Data

of thesis entitled: Learning with Unlabeled Data Submitted by XU, Zenglin for the degree of Doctor of Philosophy at The Chinese University of Hong Kong in January 2009 We consider the problem of learning from both labeled and unlabeled data through the analysis on the quality of the unlabeled data. Usually, learning from both labeled and unlabeled data is regarded as semi-supervised learning, w...

متن کامل

Neural Simpletrons - Minimalistic Probabilistic Networks for Learning With Few Labels

Classifiers for the semi-supervised setting often combine strong supervised models with additional learning objectives to make use of unlabeled data. This results in powerful though very complex models that are hard to train and that demand additional labels for optimal parameter tuning, which are often not given when labeled data is very sparse. We here study a minimalistic multi-layer generat...

متن کامل

Semi-supervised Gaussian Process Ordinal Regression

Ordinal regression problem arises in situations where examples are rated in an ordinal scale. In practice, labeled ordinal data are difficult to obtain while unlabeled ordinal data are available in abundance. Designing a probabilistic semi-supervised classifier to perform ordinal regression is challenging. In this work, we propose a novel approach for semi-supervised ordinal regression using Ga...

متن کامل

Does Unlabeled Data Provably Help? Worst-case Analysis of the Sample Complexity of Semi-Supervised Learning

We study the potential benefits of unlabeled data to classification prediction to the learner. We compare learning in the semi-supervised model to the standard, supervised PAC (distribution free) model, considering both the realizable and the unrealizable (agnostic) settings. Roughly speaking, our conclusion is that access to unlabeled samples cannot provide sample size guarantees that are bett...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2017